# Systolic Architecture for Realization of Two Dimensional Discrete Hartley Transform

M.Narayan Murty<sup>1</sup> and H.Panda<sup>2</sup>

<sup>1</sup>(Visiting Faculty, Department of Physics, Sri Sathya Sai Institute of Higher Learning, Prasanthi Nilayam – 515134, Andhra Pradesh, India) <sup>2</sup>(Department of physics, S.K.C.G. Autonomous College, Paralakhemundi -761200, Odisha, India)

**Abstract:** Discrete Hartley transform is an important tool in digital signal processing. This paper presents an algorithm for realization of two dimensional discrete Hartley transform (2-D DHT) of size  $N \times N$  using systolic architecture. The 2-D DHT can be realized in two stages. For computation of these two stages, two systolic architectures are presented for realization of 2-D DHT of size  $3 \times 3$ . These two systolic architectures can be orthogonally interspersed together for realization 2-D DHT without requiring any transposition buffer. The number of processing elements (PEs) used in the combined structure is  $2N^2$ . Each PE consists of one multiplier, one adder and one register.

Keywords: Discrete Hartley transform, discrete Fourier transform, systolic architecture.

# I. Introduction

The discrete Hartley transform (DHT) [1], [2] plays an important role in many digital signal processing (DSP) applications since it is a good alternative to the discrete Fourier transform (DFT) for its real-number operations. One of the main attractions of DHT is that it only involves real computations in contrast to complex computations in the DFT. In addition, the inverse DHT has the same form as the forward DHT, except for a scaling factor. Therefore, a single kind of program or architecture can be used to carry out both the forward and inverse DHT computations.

Over the years, the DHT has been established as a potential tool for signal processing and communication applications, e.g., computation of circular convolution, and deconvolution [3], [4], interpolation of real-valued signals [5], image compression [6], [7], error control coding [8], adaptive filtering [9], multi-carrier modulation and many other applications [10]-[12]. Fast implementation of one-dimensional (1-D DHT) has attracted many attentions [13]-[15]. However, DHT is computation intensive.

Several algorithms for the fast computation of multi-dimensional DHT are available in the literature. But the 2D-DHT is the most popular among the DHTs of higher dimensions due to its applications in various signal processing and image processing applications [16]-[19].

This paper presents an algorithm for realization of two dimensional discrete Hartley transform (2-D DHT) of size  $N \times N$  using systolic architecture. The 2-D DHT can be realized in two stages. For computation of these two stages, two systolic architectures are presented for realization of 2-D DHT of size  $3 \times 3$ . These two systolic architectures can be interspersed together without the transposition buffer. The systolic architecture has the following characteristics:

- A massive and non-centralized parallelism
- Local communications
- Synchronous evaluation

The systolic arrays are used in the design and implementation of high performance digital signal processing equipment. Systolic architectures are established as the most popular and dominant class of VLSI structures due to the simplicity of their processing elements (PEs), modularity of their structure, regular and nearest neighbour interconnections between the PEs, High level of pipelinability, small chip area and lower dissipation. In the systolic architectures, the desired data are pumped rhythmically in regular intervals across the PEs for yielding high throughput by fully pipelined processing. The systolic array concept can also be exploited at bit level in the design of individual VLSI chips. The highly regular structure of systolic circuits renders them comparatively easy to design and test.

The rest of the paper is organized as follows. The proposed algorithm for 2-D DHT is presented in Section-II. The systolic architecture for implementation of 2-D DHT of size  $3\times3$  is presented in Section-III. Conclusion is given in Section-IV.

### **II.** Proposed Algorithm For 2-D DHT

The one dimensional discrete Hartley transform (1-D DHT) for real input data sequence  $\{x(m); m = 0, 1, 2, ..., N-1\}$  of length N is defined as

$$H(k) = \sum_{m=0}^{N-1} x(m) cas\left(\frac{2\pi km}{N}\right)$$
for k = 0, 1, 2, ..., N - 1.
(1)

Where  $cas\left(\frac{2\pi km}{N}\right)$  is the transform's kernel and  $cas\theta = \cos\theta + \sin\theta$ . The *H* values represent the transformed data. The 1-D DHT defined by (1) can be extended to two dimensional discrete Hartley transform

(2-D DHT). The 2-D DHT of input data array x(m, n) of size  $N \times N$  is given by

$$H(k,l) = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} x(m,n) cas\left(\frac{2\pi km}{N}\right) cas\left(\frac{2\pi l n}{N}\right)$$
(2)  
for k, l = 0, 1, 2, ..., N - 1.

Defining  $C_N^{ij} = cas\left(\frac{2\pi ij}{N}\right)$ , (2) can be written as

$$H(k,l) = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} x(m,n) \mathcal{C}_{N}^{km} \mathcal{C}_{N}^{ln}$$
(3)

Define

$$T(k,n) = \sum_{m=0}^{N-1} x(m,n) C_N^{km}$$
(4)

Substituting (4) in (3), we have

$$H(k,l) = \sum_{n=0}^{N-1} T(k,n) C_N^{ln} \qquad \text{for } k, l = 0, 1, 2, ..., N-1.$$
(5)

The 2-D DHT given by (5) can be realized by the following two steps.

Step 1: computation of T(k,n) given by (4) using a systolic architecture.

37 1

Step 2: computation of H(k, l) given by (5) using a systolic architecture similar to that used in step 1.

An example for realizing the 2-D DHT of size  $N \times N = 3 \times 3$  is given in the next section.

## III. Systolic Architecture For Implementation of 2-D DHT of Size 3×3

To clarify the proposed algorithm, a 2-D DHT of size 3×3 is considered. **3.1 Step 1:** 

Substituting n = 0, 1, 2 successively in (4), we obtain the following three expressions for N = 3 & k = 0.

$$T(0,0) = x(0,0)_{C_3}^{00} + x(1,0)_{C_3}^{01} + x(2,0)_{C_3}^{02}$$
(6)

$$T(0,1) = x(0,1)_{C_3}^{00} + x(1,1)_{C_3}^{01} + x(2,1)_{C_3}^{02}$$
(7)

$$T(0,2) = x(0,2)_{\mathcal{C}_{3}}^{00} + x(1,2)_{\mathcal{C}_{3}}^{01} + x(2,2)_{\mathcal{C}_{3}}^{02}$$
(8)

Substituting N = 3, k = 1 & n = 0, 1, 2 in (4), we have

$$T(1,0) = x(0,0)_{\mathcal{C}_{3}}^{10} + x(1,0)_{\mathcal{C}_{3}}^{11} + x(2,0)_{\mathcal{C}_{3}}^{12}$$
(9)

$$T(1,1) = x(0,1)_{\mathcal{C}_{3}^{10}} + x(1,1)_{\mathcal{C}_{3}^{11}} + x(2,1)_{\mathcal{C}_{3}^{12}}$$
(10)

$$T(1,2) = x(0,2)C_3^{10} + x(1,2)C_3^{11} + x(2,2)C_3^{12}$$
(11)

Similarly, substituting N = 3, k = 2 & n = 0, 1, 2 in (4), we get

$$T(2,0) = x(0,0)_{\mathcal{C}_{3}}^{20} + x(1,0)_{\mathcal{C}_{3}}^{21} + x(2,0)_{\mathcal{C}_{3}}^{22}$$
(12)

$$T(2,1) = x(0,1)C_{3}^{20} + x(1,1)C_{3}^{21} + x(2,1)C_{3}^{22}$$
(13)

$$T(2,2) = x(0,2)\mathcal{C}_{3}^{20} + x(1,2)\mathcal{C}_{3}^{21} + x(2,2)\mathcal{C}_{3}^{22}$$
(14)

The nine relations from (6) to (14) are realized using the systolic architecture shown in Figure 1. The symbol '•'denotes a delay element. This architecture consists of  $N^2 = 9$  identical PEs. Each PE consists of one multiplier, one adder and one register for storing  $c_N^{km}$ . The function of each PE is shown in Figure 3. When the input data x(m,n) moves down the architecture, T(k,n) given by the relations from (6) to (14) will be generated as shown in Fig. 1.



Fig.1. Systolic architecture for computation of T(k, n) given by (4) for N = 3; k = 0, 1, 2; n = 0, 1, 2.

#### 3.2 Step 2:

Substituting l = 0, 1, 2 successively in (5), we get the following three expressions for N = 3 & k = 0.

$$H(0,0) = T(0,0)_{\mathcal{C}_{3}^{00}} + T(0,1)_{\mathcal{C}_{3}^{01}} + T(0,2)_{\mathcal{C}_{3}^{02}}$$
(15)

$$H(0,1) = T(0,0)_{\mathcal{C}_{3}}^{10} + T(0,1)_{\mathcal{C}_{3}}^{11} + T(0,2)_{\mathcal{C}_{3}}^{12}$$
(16)

$$H(0,2) = T(0,0)_{\mathcal{C}_{3}^{20}} + T(0,1)_{\mathcal{C}_{3}^{21}} + T(0,2)_{\mathcal{C}_{3}^{22}}$$
(17)

Substituting N = 3, k = 1 & l = 0, 1, 2 in (5), we obtain

$$H(1,0) = T(1,0)_{C_3}^{00} + T(1,1)_{C_3}^{01} + T(1,2)_{C_3}^{02}$$
(18)

$$H(1,1) = T(1,0)_{\mathcal{C}_{3}^{10}} + T(1,1)_{\mathcal{C}_{3}^{11}} + T(1,2)_{\mathcal{C}_{3}^{12}}$$
(19)

$$H(1,2) = T(1,0)C_{3}^{20} + T(1,1)C_{3}^{21} + T(1,2)C_{3}^{22}$$
(20)

Similarly, substituting N = 3, k = 2 & l = 0, 1, 2 in (5), we have

$$H(2,0) = T(2,0)_{\mathcal{C}_{3}}^{00} + T(2,1)_{\mathcal{C}_{3}}^{01} + T(2,2)_{\mathcal{C}_{3}}^{02}$$
(21)

$$H(2,1) = T(2,0)_{C_3}^{10} + T(2,1)_{C_3}^{11} + T(2,2)_{C_3}^{12}$$
(22)

$$H(2,2) = T(2,0)_{\mathcal{C}_{3}}^{20} + T(2,1)_{\mathcal{C}_{3}}^{21} + T(2,2)_{\mathcal{C}_{3}}^{22}$$
(23)

The 2D-DHT of size  $3\times3$  is realized by computing the expressions (15) to (23) using the systolic architecture shown in Figure 2. The output data generated in Figure 1 enters the systolic architecture shown in Figure 2. The symbol '•'denotes a delay element. This architecture consists of  $N^2 = 9$  identical PEs. Each PE consists of one multiplier, one adder and one register for storing  $c_N^{ln}$ . The function of each PE is shown in Figure 3. When the data T(k,n) moves down the architecture, the output 2-D DHT components H(k,l) given by the expressions from (15) to (23) will be generated as shown in Fig. 2.



The two systolic architectures shown in Fig. 1 & 2 are orthogonally interspersed together without any transposition buffer for realization of 2-D DHT. During each clock cycle, each PE performs one multiplication and one addition. The duration of one cycle period is  $T = T_M + T_A$ , where  $T_M$  and  $T_A$  are, respectively, the times involved in performing one multiplication and one addition in the PE.

Table 1 gives the comparison of the computation complexities of the proposed architecture with systolic architectures of 2-D DFT. The number of multipliers of proposed architecture is less than that of [20] and [21]. The number of adders of proposed architecture is less than that of [21]. The cycle period of this systolic architecture is less than that of [20].

| Table1. Comparison of computation complexities |             |                 |                |
|------------------------------------------------|-------------|-----------------|----------------|
| Structure                                      | Multipliers | Adders          | Cycle-time (T) |
| Proposed                                       | $2N^2$      | $2N^2$          | $T_M + T_A$    |
| 2-D DFT [20]                                   | $4N^2$      | $2N^2$          | $T_M + 2T_A$   |
| 2-D DFT [21]                                   | 2N(2N+1)    | $4N^2 + 6N - 4$ | $T_M + T_A$    |

Table1. Comparison of computation complexities

## **IV. Conclusion**

This paper presents two systolic architectures for realization of 2-D DHT of size  $N \times N$ . These two systolic architectures can be orthogonally interspersed together for realization 2-D DHT without requiring any transposition buffer. The number of processing elements (PEs) used in the combined structure is  $2N^2$ . Each PE consists of one multiplier, one adder and one register. So,  $2N^2$  multipliers and equal numbers of adders are used for realization of 2-D DHT. The computation complexities of this systolic architecture are compared with systolic architectures of 2-D DFT [20] and [21]. This comparison shows that the proposed architecture is better

than those two systolic architectures using 2-D DFT. The systolic architecture utilizes parallel structures to achieve high speed performance. The highly regular structure of systolic circuits renders them comparatively easy to design and test. The systolic architecture is suitable for VLSI implementation.

#### References

- R. N. Bracewell, "Discrete Hartley transform," J. Opt. Soc. Amer., vol. 73, no. 12, pp. 1832-1835, 1983. [1].
- [2]. R. N. Bracewell, "The fast Hartley transform," Proc. IEEE, vol. 72, pp. 1010-1018, 1984.
- R. N. Bracewell, The Fourier Transform and its Applications, 3<sup>rd</sup> ed., New York: McGraw-Hill, 2000. [3].
- [4]. L. -Z. Cheng, L. Tong, and Z. -R. Jiang, "Real transform algorithm for computing discrete circular deconvolution," in Proc. 3rd IEEE Int. Conf. Signal Process., vol. 1, pp. 166-169, Oct. 1996.
- [5]. G. S. Maharana and P. K. Meher, "Algorithm for efficient interpolation of real-valued signals using discrete Hartley transform," Comp. Elect. Eng., vol. 23, no. 2, pp. 129-134, Mar. 1997.
- [6]. K. S. Tzou and T. R. Hsing, "A study of discrete Hartley transform for image compression application," in Proc. SPIE Int. Soc. Opt. Eng., vol. 534, Jan. 1985.
- C. H. Paik and M. D. Fox, "Fast Hartley transform for image processing," IEEE Trans. Med. Imag., vol. 7, no. 6, pp. 149-153, Jun. [7]. 1988.
- J. -L. Wu and J. Shiu, "Discrete Hartley transform in error control coding," IEEE Trans. Signal Process., vol. 39, no. 10, pp. 2356-[8]. 2359, Oct. 1991.
- [9]. P. K. Meher and G. Panda, "Unconstrained Hartley-domain least mean square adaptive filter," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 40, no. 9, pp. 582-585, Sep. 1993.
- C. -L. Wang, C. -H. Chang, J. L. Fan, and J. M. Cioffi, "Discrete Hartley transform based multicarrier modulation," in Proc. IEEE [10]. Int. Conf. Acoust., Speech, Signal Process. (ICASSP'00), vol. 5, Jun. 2000, pp. 2513-2516.
- R. N. Bracewell, "Aspects of the Hartley transform," Proc. IEEE, vol. 82, no. 3, pp. 381-387, Mar. 1994. [11].
- C. -L. Wang, C. -H. Chang, "A DHT-based FFT/IFFT processor for VDSL transceivers," in Proc. IEEE Int. Conf. Acoust., Speech, [12]. Signal Process. (ICASSP'01), vol. 2, May 2001, pp. 1213-1216.
- S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, "A New split-radix FHT algorithm for length q\*2"," IEEE Trans. Circuits [13]. Syst. I, Reg. Papers, vol. 51, no. 10, pp. 2031-2043, Oct. 2004. P. K. Meher, T. Srikanthan, and J. C. Patra, "Scalable and modular memory-based systolic architectures for discrete Hartley
- [14]. transform," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 5, pp. 1065-1077, May 2006.
- P. K. Meher, J. C. Patra, and M. N. S. Swamy, "High-throughput memory-based architecture for DHT using a new convolutional [15]. formulation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 7, pp. 606-610, Jul. 2007.
- [16]. M.Perkins, "A separable Hartley-like transform in two or more dimensions", Proceedings of the IEEE, vol.75, no.8, pp. 1127-1129, 1987
- [17]. A.B.Watson and A.Poirson, "Separable two-dimensional discrete Hartley transform", Journal of Optical Society of America, vol.3, no.12, pp. 2001-2004, 1986.
- L.Tao, H.Kwan, and J.J.Gu, "Filterbank -based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform", [18]. IEEE International Symposium on Circuits and Systems, Brazil, pp. 1512-1515, May 2011.
- L.Tao and H.Kwan, "Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform", IEEE [19]. Transactions on Image Processing, vol.21, no.7, pp.3306-3311, 2012.
- P.K.Meher, "Design of a fully-pipelined systolic array for flexible transposition-free VLSI of 2-D DFT", IEEE Trans. Circuits Syst. [20]. II: Express Briefs, vol.52, no.2, pp. 85-89, Feb.2005.
- [21]. P.K.Meher, J.C.Patra, and A.P.Vinod, APCCAS 2006, IEEE, pp. 1927-1930.